fix: report correct reason in kube_pod_status_reason metric #2644
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: carlosmorenokm1. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
This issue is currently awaiting triage. If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the `triage/accepted` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Welcome @carlosmorenokm1!
Force-pushed from 23f9138 to 518db3d
```go
for _, cond := range p.Status.Conditions {
	if cond.Reason == reason {
		return 1
	}
}
```
Should we only care about the last condition? If so, do we need to remove this part?
No, it's necessary to iterate through all the conditions because the reason may be in any of them.
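Here is a minimal, self-contained sketch (not the actual kube-state-metrics code; the helper name and sample data are illustrative) of why scanning every condition matters: the matching reason can appear in an earlier condition rather than only the last one.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// conditionReasonValue mirrors the loop discussed above: it returns 1 if any
// pod condition carries the given reason, 0 otherwise.
func conditionReasonValue(status v1.PodStatus, reason string) float64 {
	for _, cond := range status.Conditions {
		if cond.Reason == reason {
			return 1
		}
	}
	return 0
}

func main() {
	status := v1.PodStatus{
		Conditions: []v1.PodCondition{
			// The reason of interest is on the first condition, not the last,
			// so checking only the final entry would miss it.
			{Type: v1.PodScheduled, Reason: "NodeAffinity"},
			{Type: v1.PodReady, Reason: ""},
		},
	}
	fmt.Println(conditionReasonValue(status, "NodeAffinity")) // prints 1
}
```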
Will it be a stale condition?
It will not be a stale condition. Kubernetes regularly updates Pod conditions, so if a condition with the matching reason is found, it is assumed to be current. If a stale condition were detected, that would indicate an issue in Kubernetes itself, not in this logic.
Will a pod have multiple different reasons?
Yes, a Pod can have different reasons throughout its lifecycle. Each event or change in the Pod’s state (for example, container creation, image pulling, runtime errors, or restarts) can record a different reason, so it is entirely possible for a single Pod to go through multiple different reasons as it transitions between states.
Yes, I was thinking of the case where the pod first fails to pull an image, then hits runtime errors, then restarts.
Will the metric report all three of these reasons?
Yes, if your Pod transitions through those states (e.g., failed to pull image, runtime errors, then restarts), the metric can capture each corresponding reason at the time it occurs. However, you won’t necessarily see all reasons simultaneously; rather, you’ll see them reflected as changes in the metric over the Pod’s lifecycle.
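For illustration only (pod name and label set are abbreviated and the values are not taken from a real cluster), at a single scrape the exported series might look like this, with only the currently applicable reason set to 1:

```
kube_pod_status_reason{namespace="default",pod="example-pod",reason="Evicted"} 0
kube_pod_status_reason{namespace="default",pod="example-pod",reason="NodeLost"} 1
```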
Could we update the title to be "fix: report correct reason in kube_pod_status_reason metric"?
What this PR does / why we need it:
This PR updates the logic for generating the kube_pod_status_reason metric. Instead of only checking p.Status.Reason, the new implementation also verifies the pod conditions and the termination reasons of container statuses. This change fixes an issue where the metric always returned 0, even when a pod had a valid status reason (such as "Evicted", "NodeLost", etc.), leading to inaccurate monitoring data. Accurately reporting these values is crucial for diagnosing pod behavior and overall cluster health.
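As a rough sketch of the approach described above (illustrative only; the helper name and exact structure are not necessarily those used in this PR), the value for a given reason is derived from the top-level status, the pod conditions, and the container termination states:

```go
// getPodStatusReasonValue is an illustrative helper, not the exact code in
// this PR. It returns 1 if the given reason appears in the pod's top-level
// status, in any pod condition, or in any container's termination state.
func getPodStatusReasonValue(p *v1.Pod, reason string) float64 {
	if p.Status.Reason == reason {
		return 1
	}
	for _, cond := range p.Status.Conditions {
		if cond.Reason == reason {
			return 1
		}
	}
	for _, cs := range p.Status.ContainerStatuses {
		if cs.State.Terminated != nil && cs.State.Terminated.Reason == reason {
			return 1
		}
	}
	return 0
}
```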
How does this change affect the cardinality of KSM:
It does not change the cardinality. The update only adjusts the value calculation for an existing metric family, so no new labels or metric series are introduced.
Which issue(s) this PR fixes:
Fixes #2612